{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neeps - Easy\n",
    "\n",
    "The \"Neeps\" database includes details of all teaching events in the School of Computing at Napier University in Semester 1 of the 2000/2001 academic year.\n",
    "\n",
    "Graduated questions\n",
    "\n",
    "The timetable database contains the following tables:\n",
    "\n",
    "- ut_staff\n",
    "\n",
    "id | name\n",
    "---|-----\n",
    "co.ACg | Cumming, Andrew\n",
    "co.ACr | Crerar, Dr Alison\n",
    "co.AFA | Armitage, Dr Alistair\n",
    " | ... Result truncated.\n",
    "\n",
    "The staff table simply gives the unique identifier for each member of staff together with their full name.\n",
    "\n",
    "- ut_student\n",
    "\n",
    "id | name | sze | parent\n",
    "---|------|----:|----------\n",
    "co1.BAe | BAe | 15 | \n",
    "co1.CO | Computing 1st Year | 150 | \n",
    "co1.CO.a | Computing 1st Year a | 20 | co1.CO\n",
    "co1.CO.c | Computing 1st Year c | 20 | co1.CO\n",
    "co1.CO.d | Computing 1st Year d | 20 | co1.CO\n",
    " | | | ... Result truncated.\n",
    "\n",
    "Each entry in the student table represents a group of students who follow the same timetable. The parent field is used to maintain a hierachy of groups. A student in group co1.CO.a is also in group co1.CO\n",
    "\n",
    "- ut_room\n",
    "\n",
    "id | name | capacity | parent\n",
    "---|------|---------:|---------\n",
    "co.117+118 |  | 32 | \n",
    "co.117 |  | 16 | co.117+118\n",
    "co.118 |  | 16 | co.117+118\n",
    "co.B7 |  | 40 | \n",
    " | | | ... result truncated\n",
    "\n",
    "All classrooms are listed in the room table. The parent relation is used to maintain groupings. The row co.117+118 represents a pair of rooms - sometimes these rooms are used for a single event.\n",
    "\n",
    "- ut_event\n",
    "\n",
    "id | modle | kind | dow | tod | duration | room\n",
    "---|-------|------|-----|-----|---------:|----------\n",
    "co12004.L01 | co12004 | L | Wednesday | 11:00 | 1 | cr.SMH\n",
    "co12004.L02 | co12004 | L | Monday | 17:00 | 1 | cr.B13\n",
    "co12004.T01 | co12004 | T | Monday | 11:00 | 2 | co.G78+G82\n",
    "co12004.T02 | co12004 | T | Tuesday | 15:00 | 2 | co.B7\n",
    "co12004.T03 | co12004 | T | Tuesday | 13:00 | 2 | co.G78+G82\n",
    " | | | | | | ... Result truncated.\n",
    "\n",
    "Every event is associated with a single modle (deliberately mis-spelled version of module - which is a reserved word in one of engines). An event occurs on a particular day of the week (dow) at a particular time of day (tod). Every event has a duration specified as a whole number of hours.\n",
    "\n",
    "- ut_attends\n",
    "\n",
    "student | event\n",
    "--------|------\n",
    "co1.BAe | co12004.L01\n",
    "co1.CO | co12004.L01\n",
    "co1.IS | co12004.L01\n",
    "co1.SE.pt | co12004.L02\n",
    "co1.CO.a | co12004.T01\n",
    " | ... Result truncated.\n",
    "\n",
    "The attends table links event to student. It realises the many to many relation.\n",
    "\n",
    "- ut_teaches\n",
    "\n",
    "staff | event\n",
    "------|----------\n",
    "co.AMn | co12004.L01\n",
    "co.RK | co12004.L01\n",
    "co.SRM | co12004.L01\n",
    "co.RK | co12004.L02\n",
    "co.SRM | co12004.L02\n",
    " | ... Result truncated.\n",
    "\n",
    "The teaches table links event to staff.\n",
    "\n",
    "- ut_occurs\n",
    "\n",
    "event | week\n",
    "------|-------:\n",
    "co12004.L01 | 01\n",
    "co12004.L01 | 02\n",
    "co12004.L01 | 03\n",
    "co12004.L01 | 04\n",
    "co12004.L01 | 05\n",
    " | ... Result truncated.\n",
    "\n",
    "Most events occur on weeks 01 to 13. This table records exactly which weeks are part of the plan for each event.\n",
    "\n",
    "- ut_modle\n",
    "\n",
    "id | name\n",
    "---|---------\n",
    "co12001 | Rapid Application Development\n",
    "co12002 | Software Development 1A\n",
    "co12003 | Professional Skills\n",
    "co12004 | Applications Workshop\n",
    " | ... Result truncated.\n",
    "\n",
    "By the 21st century, the Gradgrindians had completed their assimilation of Education. The process of \"modularisation\" was just one of their crimes against learning.\n",
    "\n",
    "- ut_week\n",
    "\n",
    "id | wkstart\n",
    "--:|-------:\n",
    "01 | 02/10/00\n",
    "02 | 09/10/00\n",
    "03 | 16/10/00\n",
    " | ...Result truncated\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading spark-stubs, spark-hive\n",
      "Adding Hive conf dir /opt/hive/conf to classpath\n",
      "Creating SparkSession\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "SLF4J: No SLF4J providers were found.\n",
      "SLF4J: Defaulting to no-operation (NOP) logger implementation\n",
      "SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<a target=\"_blank\" href=\"http://jupyter:4040\">Spark UI</a>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "\u001b[32mimport \u001b[39m\u001b[36m$ivy.$                                  \n",
       "\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.log4j.{Level, Logger}\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark._\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.sql._\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.sql.types._\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.sql.functions._\n",
       "\u001b[39m\n",
       "\u001b[32mimport \u001b[39m\u001b[36morg.apache.spark.sql.expressions.Window\n",
       "\n",
       "\u001b[39m\n",
       "\u001b[36mspark\u001b[39m: \u001b[32mSparkSession\u001b[39m = org.apache.spark.sql.SparkSession@5c779403"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import $ivy.`org.apache.spark::spark-sql:3.4.0`\n",
    "\n",
    "import org.apache.log4j.{Level, Logger}\n",
    "Logger.getLogger(\"org\").setLevel(Level.OFF)\n",
    "\n",
    "import org.apache.spark._\n",
    "import org.apache.spark.sql._\n",
    "import org.apache.spark.sql.types._\n",
    "import org.apache.spark.sql.functions._\n",
    "import org.apache.spark.sql.expressions.Window\n",
    "\n",
    "val spark = {\n",
    "    NotebookSparkSession.builder()\n",
    "    .progress(false)\n",
    "    .appName(\"app15-1\")\n",
    "    // .master(\"spark://192.168.31.31:7077\")\n",
    "    .master(\"local[*]\")\n",
    "    .config(\"spark.sql.warehouse.dir\", \n",
    "            \"hdfs://192.168.31.31:9000/user/hive/warehouse\") \n",
    "    .config(\"spark.cores.max\", \"4\") \n",
    "    .config(\"spark.executor.instances\", \"1\") \n",
    "    .config(\"spark.executor.cores\", \"2\") \n",
    "    .config(\"spark.executor.memory\", \"10g\") \n",
    "    .config(\"spark.shuffle.service.enabled\", \"false\") \n",
    "    .config(\"spark.dynamicAllocation.enabled\", \"false\") \n",
    "    .config(\"spark.sql.catalogImplementation\", \"hive\")\n",
    "    .config(\"spark.sql.repl.eagerEval.enabled\", \"true\")\n",
    "    .config(\"spark.driver.allowMultipleContexts\", \"true\")\n",
    "    .getOrCreate()\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[32mimport \u001b[39m\u001b[36mspark.implicits._\n",
       "\u001b[39m\n",
       "defined \u001b[32mfunction\u001b[39m \u001b[36msc\u001b[39m\n",
       "\u001b[36mhiveCxt\u001b[39m: \u001b[32msql\u001b[39m.\u001b[32mhive\u001b[39m.\u001b[32mHiveContext\u001b[39m = org.apache.spark.sql.hive.HiveContext@66031ebd"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import spark.implicits._\n",
    "def sc = spark.sparkContext\n",
    "val hiveCxt = new org.apache.spark.sql.hive.HiveContext(sc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "defined \u001b[32mclass\u001b[39m \u001b[36mRichDF\u001b[39m"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// Credit to Aivean\n",
    "implicit class RichDF(val ds:DataFrame) {\n",
    "    def showHTML(limit: Int = 50, truncate: Int = 100) = {\n",
    "        import xml.Utility.escape\n",
    "        val data = ds.take(limit)\n",
    "        val header = ds.schema.fieldNames.toSeq        \n",
    "        val rows: Seq[Seq[String]] = data.map { row =>\n",
    "          row.toSeq.map {cell =>\n",
    "            val str = cell match {\n",
    "              case null => \"null\"\n",
    "              case binary: Array[Byte] => binary.map(\"%02X\".format(_)).mkString(\"[\", \" \", \"]\")\n",
    "              case array: Array[_] => array.mkString(\"[\", \", \", \"]\")\n",
    "              case seq: Seq[_] => seq.mkString(\"[\", \", \", \"]\")\n",
    "              case _ => cell.toString\n",
    "            }\n",
    "            if (truncate > 0 && str.length > truncate) {\n",
    "              // do not show ellipses for strings shorter than 4 characters.\n",
    "              if (truncate < 4) str.substring(0, truncate)\n",
    "              else str.substring(0, truncate - 3) + \"...\"\n",
    "            } else {\n",
    "              str\n",
    "            }\n",
    "          }: Seq[String]\n",
    "        }\n",
    "    publish.html(s\"\"\" <table>\n",
    "                <tr>\n",
    "                 ${header.map(h => s\"<th>${escape(h)}</th>\").mkString}\n",
    "                </tr>\n",
    "                ${rows.map {row =>\n",
    "                  s\"<tr>${row.map{c => s\"<td>${escape(c)}</td>\" }.mkString}</tr>\"\n",
    "                }.mkString}\n",
    "            </table>\n",
    "        \"\"\")\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[36mut_staff\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, name: string]\n",
       "\u001b[36mut_student\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, name: string ... 2 more fields]\n",
       "\u001b[36mut_event\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, modle: string ... 5 more fields]\n",
       "\u001b[36mut_room\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, name: string ... 2 more fields]\n",
       "\u001b[36mut_attends\u001b[39m: \u001b[32mDataFrame\u001b[39m = [student: string, event: string]\n",
       "\u001b[36mut_teaches\u001b[39m: \u001b[32mDataFrame\u001b[39m = [staff: string, event: string]\n",
       "\u001b[36mut_occurs\u001b[39m: \u001b[32mDataFrame\u001b[39m = [event: string, week: string]\n",
       "\u001b[36mut_modle\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, name: string]\n",
       "\u001b[36mut_week\u001b[39m: \u001b[32mDataFrame\u001b[39m = [id: string, wkstart: string]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "val ut_staff = hiveCxt.table(\"sqlzoo.ut_staff\")\n",
    "val ut_student = hiveCxt.table(\"sqlzoo.ut_student\")\n",
    "val ut_event = hiveCxt.table(\"sqlzoo.ut_event\")\n",
    "val ut_room = hiveCxt.table(\"sqlzoo.ut_room\")\n",
    "val ut_attends = hiveCxt.table(\"sqlzoo.ut_attends\")\n",
    "val ut_teaches = hiveCxt.table(\"sqlzoo.ut_teaches\")\n",
    "val ut_occurs = hiveCxt.table(\"sqlzoo.ut_occurs\")\n",
    "val ut_modle = hiveCxt.table(\"sqlzoo.ut_modle\")\n",
    "val ut_week = hiveCxt.table(\"sqlzoo.ut_week\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.\n",
    "**Give the room id in which the event co42010.L01 takes place.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <table>\n",
       "                <tr>\n",
       "                 <th>room</th>\n",
       "                </tr>\n",
       "                <tr><td>cr.132</td></tr>\n",
       "            </table>\n",
       "        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(ut_event.filter(ut_event(\"id\")===\"co42010.L01\")\n",
    " .select(\"room\")\n",
    " .showHTML())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.\n",
    "**For each event in module co72010 show the day, the time and the place.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <table>\n",
       "                <tr>\n",
       "                 <th>dow</th><th>tod</th><th>room</th>\n",
       "                </tr>\n",
       "                <tr><td>Wednesday</td><td>14:00</td><td>cr.SMH</td></tr><tr><td>Tuesday</td><td>09:00</td><td>cr.B8</td></tr><tr><td>Wednesday</td><td>09:00</td><td>co.B7</td></tr><tr><td>Tuesday</td><td>12:00</td><td>co.LB42+LB46</td></tr><tr><td>Tuesday</td><td>11:00</td><td>co.G75+G76</td></tr><tr><td>Wednesday</td><td>16:00</td><td>co.LB42+LB46</td></tr><tr><td>Thursday</td><td>10:00</td><td>co.LB42+LB46</td></tr><tr><td>Wednesday</td><td>13:00</td><td>co.117+118</td></tr>\n",
       "            </table>\n",
       "        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(ut_event.filter(ut_event(\"modle\")===\"co72010\")\n",
    " .select(\"dow\", \"tod\", \"room\")\n",
    " .showHTML())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.\n",
    "**List the names of the staff who teach on module co72010.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <table>\n",
       "                <tr>\n",
       "                 <th>name</th>\n",
       "                </tr>\n",
       "                <tr><td>Cumming, Andrew</td></tr><tr><td>Chisholm, Ken</td></tr>\n",
       "            </table>\n",
       "        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(ut_event.filter(ut_event(\"modle\")===\"co72010\")\n",
    " .join(ut_teaches, (ut_event(\"id\")===ut_teaches(\"event\")))\n",
    " .join(ut_staff, (ut_teaches(\"staff\")===ut_staff(\"id\")))\n",
    " .select(\"name\")\n",
    " .distinct()\n",
    " .showHTML())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.\n",
    "**Give a list of the staff and module number associated with events using room cr.132 on Wednesday, include the time each event starts.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <table>\n",
       "                <tr>\n",
       "                 <th>name</th><th>modle</th>\n",
       "                </tr>\n",
       "                <tr><td>Varey, Alison</td><td>co32021</td></tr><tr><td>Murray, Jim</td><td>co22009</td></tr>\n",
       "            </table>\n",
       "        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(ut_event.filter((ut_event(\"room\")===\"cr.132\") && \n",
    "                 (ut_event(\"dow\")===\"Wednesday\"))\n",
    " .join(ut_teaches, (ut_event(\"id\")===ut_teaches(\"event\")))\n",
    " .join(ut_staff, (ut_teaches(\"staff\")===ut_staff(\"id\")))\n",
    " .select(\"name\", \"modle\")\n",
    " .showHTML())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.\n",
    "**Give a list of the student groups which take modules with the word 'Database' in the name.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       " <table>\n",
       "                <tr>\n",
       "                 <th>student_group</th>\n",
       "                </tr>\n",
       "                <tr><td>BEng4 Network and Distributing Systems</td></tr><tr><td>PgD Information Systems e (OOP)</td></tr><tr><td>PgD Information Systems</td></tr><tr><td>BSc4 Computing</td></tr><tr><td>PgD Software Engineering</td></tr><tr><td>null</td></tr><tr><td>PgD Software Technology pt. Tues</td></tr><tr><td>PgD Information Systems a (HCI)</td></tr><tr><td>PgD Information Systems b (DS)</td></tr><tr><td>PgD Information Systems d (BT)</td></tr><tr><td>PgD Information System pt. Tues</td></tr><tr><td>PgD IT and e-Commerce eve sem 3</td></tr><tr><td>PgD IT and e-Commerce</td></tr><tr><td>PgD Software Technology</td></tr>\n",
       "            </table>\n",
       "        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(ut_student\n",
    " .withColumnRenamed(\"name\", \"student_group\")\n",
    " .join(ut_attends, (ut_student(\"id\")===ut_attends(\"student\")))\n",
    " .join(ut_event, (ut_attends(\"event\")===ut_event(\"id\")))\n",
    " .join(ut_modle.filter(lower(col(\"name\")).like(\"%database%\")), \n",
    "       (ut_event(\"modle\")===ut_modle(\"id\")))\n",
    " .select(\"student_group\")\n",
    " .distinct()\n",
    " .showHTML())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "spark.stop()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Scala",
   "language": "scala",
   "name": "scala"
  },
  "language_info": {
   "codemirror_mode": "text/x-scala",
   "file_extension": ".sc",
   "mimetype": "text/x-scala",
   "name": "scala",
   "nbconvert_exporter": "script",
   "version": "2.12.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
